(19)
Europaisches Patentamt
European Patent Office
Office europeen des brevets
III
(11)
EPT$76 427 A2
(12)
EUROPEAN PATENT APPLICATION
(43) Date of publication:
02.01.2004 Bulletin 2004/01
(51) Int CI 7 : G06F 17/60
(21)
Application number: 03006814.2
(22)
Date of filing: 26.03.2003
(84)
Designated Contracting States:
(72)
Inventors:
AT BE BG CH CY CZ DE DK EE ES Fl FR GB GR
•
Goodman, Joshua Theodore
HU IE IT LI LU MC NL PT RO SE SI SK TR
Redmond, Washington 98052 (US)
Designated Extension States:
•
Rounthwaite, Robert L.
AL LT LV MK
Fall City, Washington 98024 (US)
(30)
Priority. 26.06.2002 US 180565
(74)
Representative: Grunecker, Kinkeldey,
Stockmair & Schwanhausser Anwaltssozietat
(71)
Applicant: MICROSOFT CORPORATION
Maximilianstrasse 58
Redmond, Washington 98052-6399 (US)
80538 Munchen (DE)
(54) SPAM detector with challenges
(57) A system and method facilitating detection of
unsolicited e-mail message(s) with challenges is provid-
ed. The invention includes an e-mail component and a
challenge component. The system can receive e-mail
message(s) and associated probabilities that the e-mail
message(s) are spam. Based, at least in part, upon the
associated probability, the system can send a challenge
to a sender of an e-mail message. The challenge can
be an embedded code, computational challenge, hu-
man challenge and/or micropayment request. Based, at
least in part, upon a response to the challenge (or lack
of response), the challenge component can modify the
associated probability and/or delete the e-mail mes-
sage.
E-MAIL COMPONENT
CHALLENGE
COMPONENT
100
110
CHALLENGE(S)
CM
<
CM
CD
CO
120
RESPONSE(S) TO
CHALLENGE(S)
FIG. 1
Q_
LU
Printed by Jouve, 75001 PARIS (FR)
BNSDOCID: <EP
1376427A2_I_>
*
1 EP 1 37i
Description
TECHNICAL FIELD
[0001] The present invention relates generally to elec-
tronic mail (e-mail) and more particularly to a system
and method employing unsolicited e-mail (spam) detec-
tion with challenges.
BACKGROUND OF THE INVENTION
[0002] Electronic messaging, particularly electronic
mail ("e-mail") carried over the Internet, is rapidly be-
coming not only pervasive in society but also, given its
informality, ease of use and low cost, a preferred mode
of communication for many individuals and organiza-
tions.
[0003] Unfortunately, as has occurred with more tra-
ditional forms of communication (e.g., postal mail and
telephone), e^Thaii recipients are increasingly being sub-
jected to unsolicited mass mailings. With the explosion,
particularly in the last few years, of Internet-based com-
merce, a wide and growing variety of electronic mer-
chandisers is repeatedly sending u nsolicited mail adver-
tising their products and services to an ever expanding
universe of e-mail recipients. Most consumers that order
products or otherwise transact with a merchant over the
Internet expect to and, in fact, regularly receive such
merchant solicitations. However, electronic mailers are
continually expanding their distribution lists to penetrate
deeper into society in order to reach ever increasing
numbers of recipients. For example, recipients who
merely provide their e-mail addresses in response to
perhaps innocuous appearing requests for visitor infor-
mation generated by various web sites, often find, later
upon receipt of unsolicited mail and much to their dis-
pleasure, that they have been included on electronic dis-
tribution lists. This occurs without the knowledge, let
alone the assent, of the recipients. Moreover, as with
postal direct mail lists, an electronic mailer will often dis-
seminate its distribution list, whether by sale, lease or
otherwise, to another such mailer, and so forth with sub-
sequent mailers. Consequently, over time, e-mail recip-
ients often find themselves barraged by unsolicited mail
resulting from separate distribution lists maintained by
a wide and increasing variety of mass mailers. Though
certain avenues exist, based on mutual cooperation
throughout the direct mail industry, through which an in-
dividual can request that his(her) name be removed
from most direct mail postal lists, no such mechanism
exists among electronic mailers.
[0004] Once a recipient finds him(her)self on an elec-
tronic mailing list, that individual can not readily, if at all,
remove his(her) address from it, thus effectively guar-
anteeing that he(she) will continue to receive unsolicited
mail - often in increasing amounts from that list and of-
tentimes other lists as well. This occurs simply because
the sender either prevents a recipient of a message from
i 427 A2 , 2
identifyingthe sender of that m^g^a^e (such as by send,- _
ing mail through a proxy server) and hence precludes
the recipient from contacting the sender in an attempt
to be excluded from a distribution list, or simply ignores
> any request previously received from the recipient to be
so excluded.
[0005] An individual can easily receive hundreds of
unsolicited postal mail messages over the course of a
year, or less. By contrast, given the ease and insignifi-
o cant cost through which e-distrjbution lists can be readily
exchanged and e-mail messages disseminated across
large numbers of addressees, a single e-mail addressee
included on several distribution lists can expect to re-
ceive a considerably larger number of unsolicited mes-
f5 sages over a much shorter period of time. Furthermore,
while many unsolicited e-mail messages (e.g., offers for
discount office or computer supplies or invitations to at-
tend conferences of one type or another) are benign;
others, such as pornographic, inflammatory and abu-
20 sive material, can be highly offensive to certain recipi-
ents.
[0006] Unsolicited e-mail messages are commonly
referred to as "spam". Similarto the task of handling junk
postal mail, an e-mail recipient must sift through his(her)
25 incoming mail to remove spam. Unfortunately, the
choice of whether a given e-mail message is spam or
not is highly dependent on the particular recipient and
content of the message - what may be spam to one re-
cipient may not be so to another. Frequently, an elec-
30 tronic mailer will prepare a message such that its true
content is not apparent from its subject line and can only
be discerned from reading the body of the message.
Hence, the recipient often has the unenviable task of
reading through each and every message he(she) re-
35 ceives on any given day, rather than just scanning its
subject line, to fully remove spam messages. Needless
to say, such filtering (often manually-based) can be a
laborious, time-consuming task.
[0007] In an effort to automate the task of detecting
40 abusive newsgroup messages (so-called "flames"), the
art teaches an approach of classifying newsgroup mes-
sages through a rule-based text classifier. See, E. Sper-
tus "Smokey: Automatic Recognition of Hostile Messag-
es", Proceedings of the Conference on Innovative Ap-
45 plications in Artificial Intelligence (IAAI) , 1 997. Here, se-
mantic and syntactic textual classification features are
first determined by feeding an appropriate corpus of
newsgroup messages, as a training set, through a prob-
abilistic decision tree generator. Given handcrafted
50 classifications of each of these messages as being a
"flame" or not, the generator delineates specific textual
features that, if present or not in a message, can predict
whether, as a rule, the message is a flame or not. Those
featu res that correctly predict the nature of the message
55 with a sufficiently high probability are then selected for
subsequent use. Thereafter, to classify an incoming
message, each sentence in that message is processed . _
to yield a multi-element (e.g., 47 element) feature vector,
2
BNSDOCID: <EP.
1 376427 A2_l_>
EP 1 376 427 A2
with each element simply signifying the presence or ab-
sence of a different feature in that sentence. The feature
vectors of all sentences in the message are then
summed to yield a message feature vector (for the entire
message). The message feature vector is then evaluat- 5
ed through corresponding rules produced by the deci-
sion tree generator to assess, given a combination and
number of features that are present or not in the entire
message, whether that message is either a flame or not.
For example, as one semantic feature, the author no- 10
ticed that phrases having the word "you" modified by a
certain noun phrase, such as "you people", "you bozos",
"you flamers", tend to be insulting. An exception is the
phrase "you guys" which, in use, is rarely insulting.
Therefore, one feature is whether any of these former is
word phrases exist. The associated rule is that, if such
a phrase exists, the sentence is insulting and the mes-
sage is a flame. Another feature is the presence of the
word "thank", "please" or phrasal constructs having the
word "would*(as in: "Would you be willing to e-mail me 20
your logo") but not the words "no thanks". If any such
phrases or words are present (with the exception of "no
thanks"), an associated rule, which the author refers to
as the "politeness rule" categorizes the message as po-
lite and hence not a flame. With some exceptions, the 25
rules used in this approach are not site-specific, that is,
for the most part they use the same features and operate
in the same manner regardless of the addressee being
mailed.
[0008] A rule based textual e-mail classifier here spe- 30
cifically one involving learned "keyword-spotting rules",
is described in W. W. Cohen, "Learning Rules that Clas-
sify E-mail", 1996 AAAI Spring Symposium on Machine
Learning in Information Access, 1996 (hereinafter the
"Cohen" publication). In this approach, a set of e-mail 35
messages previously classified into different categories
is provided as input to the system. Rules are then
learned from this set in order to classify incoming e-mail
messages into the various categories. While this meth-
od does involve a learning component that allows for 40
automatic generation of rules : these rules simply make
yes/no distinctions for classification of e-mail messages
into different categories without providing any confi-
dence measure for a given prediction. Moreover, in this
work, the actual problem of spam detection was not ad- 45
dressed. In this regard, rule-based classifiers suffer var-
ious serious deficiencies which, in practice, would se-
verely limit their use in spam detection. First, existing
spam detection systems require users to manually con-
struct appropriate rules to distinguish between legiti- 50
mate mail and spam. Most recipients will not bother to
undertake such laborious tasks. As noted above, an as-
sessment of whether a particular e-mail message is
spam or not can be rather subjective with its recipient.
What is spam to one recipient may, for another, not be. 55
Furthermore, non-spam mail varies significantly from
person to person. Therefore, for a rule based-classifier
to exhibit acceptable performance in filtering most spam
from an incoming mail strea m Jhje recipient mustj:on-
struct and program a set of classification rules that ac-
curately distinguishes between what constitutes spam
and what constitutes non-spam (legitimate) e-mail.
Properly doing so can be an exjtremely complex, tedious
and time-consuming task even for a highly experienced
and knowledgeable computer user.
[0009] Second, the characteristics of spam and non-
spam e-mail may change significantly over time; rule-
based classifiers are static (unless the user is constantly
willing to make changes to the rules). Accordingly, mass
e-mail senders routinely modify content of their messag-
es in a continual attempt to prevent ("outwit") recipients
from initially recognizing these messages as spam and
then discarding those messages without fully reading
them. Thus, unless a recipient is willing to continually
construct new rules or update existing rules to track
changes to spam (as that recipient perceives such
changes), then, over time, a rule-based classifier be-
comes increasingly inaccurate at distinguishing spam
from desired (non-spam) e-mail for that recipient, there-
by further diminishing utility of the classifier and frustrat-
ing the user/recipient.
[0010] Alternatively, a user might consider employing
a method for learning rules (as in the Cohen publication)
from their existing spam in order to adapt, overtime, to
changes in an incoming e-mail stream. Here, the prob-
lems of a rule-based approach are more clearly high-
lighted. Rules are based on logical expressions; hence,
as noted above, rules simply yield yes/no distinctions
regarding the classification for a given e-mail message.
Problematically, such rules provide no level of confi-
dence for their predictions.
Inasmuch as users may have various tolerances as to
how aggressive they would want to filter their e-mail to
remove spam, then, in an application such as detecting
spam, rule-based classification would become rather
problematic. For example, a conservative user may re-
quire that the system be very confident that a message
is spam before discarding it, whereas another user
many not be so cautious. Such varying degrees of user
precaution cannot be easily incorporated into a rule-
based system such as that described in the Cohen pub-
lication.
SUMMARY OF THE INVENTION
[0011] The following presents a simplified summary
of the invention in order to provide a basic understand-
ing of some aspects of the invention. This summary is
not an extensive overview of the invention. It is not in-
tended to identify key/critical elements of the invention
or to delineate the scope of the invention. Its sole pur-
pose is to present some concepts of the invention in a
simplified form as a prelude to the more detailed de-
scription that is presented later.
[0012] The present invention provides for a system for
detection of unsolicited messages {e.g... e-mail). The
3
BNSDOCID: <EP 1 376427 A2_l_>
EP 1 376 427 A2
system includes an e-mail component and a challenge
component. The system can receive message(s) and
associated probabilities that the message(s) are spam.
Based, at least in part, upon the associated probability
the system can send a challenge to a sender of a mes-
sage. The e-mail component can store message(s) and
associated probabilities that the messages are spam. In
one example, e-mail message(s) are stored with differ-
ent attributes, such as folder name, based on associat-
ed probabilities that the email message(s) are spam. In
another example, e-mail message(s) having associated
probabilities less than or equal to a first threshold are
stored in a legitimate e- mail folder while e-mail mes-
sage^) having associated probabilities greater than the
first threshold are stored in a spam folder. In yet another
implementation of the invention, e- mail message(s)
having associated probabilities less than or equal to a
first threshold are stored in a legitimate e-mail folder, e-
mail message(s) having associated probabilities greater
than the firstlhreshold, but less than or equal to a sec-
ond threshold are stored in a questionable spam folder.
Those e-mail message(s) having associated probabili-
ties greater than the second threshold are stored in a
spam folder. It is to be appreciated that the first threshold
and/or the second threshold can be fixed, based on user
preference(s) and/or adaptive (e.g., based, at least in
part, upon available computational resources).
[0013] It will be appreciated that numbers other than
probabilities, such as the score from a Support Vector
Machine, a neural network, etc. can serve the same pur-
pose as probabilities - in general, the numeric output of
any machine learning algorithm can be used in place of
a probability in accordance with an aspect of the present
invention. Similarly, some machine learning algorithms,
such as decision trees, output categorical information,
and this too can be used in place of a probability com-
bined with a threshold.
[0014] The challenge component can send a chal-
lenge to a sender of an e-mail message having an as-
sociated probability greater than a first threshold. For
example, the challenge can be based, at least in part,
upon a code embedded within the challenge (e.g., al-
phanumeric code). In responding to the challenge, the
sender of the e-mail can reply with the code. In one ex-
ample, the sender's system can be adapted to automat-
ically retrieve the embedded code and respond to the
challenge. Alternatively and/or additionally, the sender
can be prompted to respond to the challenge (e.g., man-
ually).
The use of a challenge based on an embedded code
can increase the bandwidth and/or computational load
of sendcr(s) of spam, thus, serving as a deterrent to
sending of spam. It is to be appreciated that the chal-
lenge can be any of a variety of suitable types (e.g., com-
putational challenge, a human challenge and/or a mi-
cropayment request). The challenge can be fixed and/
or variable. For example, with an increased associated
probability, the challenge component can send a more
♦
difficult challenge or onejhatjgayjres a greater micro.- _
payment.
[0015] The challenge component can modify the as-
sociated probability that the e-mail message is spam
5 based, at least in part , upon a response to the challenge. —
For example, upon receipt of an appropriate (e.g., cor-
rect) response to the challenge, the challenge compo-
nent can decrease the associated probability that the e-
mait message is spam. In one example, the e-mail mes-
10 sage is moved from a spam folder to a legitimate e-mail
folder. In another implementation, the e-mail message
is moved from a questionable spam folder to a legitimate
e-mail folder. Upon receipt of an inappropriate (e.g., in-
correct) response to the challenge and/or failure to re-
ts ceive a response to the challenge in a particular time
period (e.g., 4 hours), the challenge component can in-
crease the associated probability that the e-mail mes-
sage is spam. For example, the e-mail message can be
moved from a questionable spam folder to a spam fold-
20 er.
[0016] Another aspect of the present invention pro-
vides for the system to further include a mail classifier.
The mail classifier receives e-mail message(s), deter-
mines the associated probability that the e-mail mes-
25 sage is spam and stores the e-mail message(s) and as-
sociated probabilities in the e-mail component. Accord-
ingly, the mail classifier analyzes message content for
a given recipient and distinguishes, based on that con-
tent and for that recipient, between spam and legitimate
30 (non-spam) messages and so classifies each incoming
e-mail message for that recipient.
[0017] Additionally and/or alternatively, e-mail mes-
sage^) can be marked with an indication of likelihood
(probability) that the message is spam; message(s) as-
35 signed intermediate probabilities of spam can be
moved; based on that likelihood, to questionable spam
folder(s). Based, at least in part, upon information pro-
vided by the mail classifier, the challenge component
can send a challenge to a sender of an e-mail message
40 having an associated probability greater than a first
threshold.
[0018] Yet another aspect of the present invention
provides for the system to further include spam folder
(s) and legitimate e-mail folder(s). The mail classifier de-
45 termines the associated probability that an e-mail mes-
sage is spam and stores the e-mail message in the
spam folder(s) or the legitimate e-mail folder(s) (e.g.,
based on a first threshold). Incoming e-mail message(s)
are applied to an input of the mail classifier, which, in
so turn, probabilistically classifies each of these messages
as either legitimate or spam. Based on its classification,
the message is routed to either of the spam folder(s) or
the legitimate e- mail folder(s). Thereafter, the challenge
component can send a challenge to a sender of an e-
55 mail message stored in the spam folder(s) (e.g., having
an associated probability greater than the first thresh-
old). Based, at least in part, upon a response to the chal-
lenge, the challenge component can move the e-mail
4
BNSDOCID- <EP 1 376427 A2_l_>
EP 1 376 427 A2
8
message from the spam folder(s) to the legitimate e-mail
folder(s). For example, upon receipt of an appropriate
(e.g., correct) response to the challenge, the challenge
component can move the e-mail message from the
spam folder(s) to the legitimate e-mail folder(s). Further-
more, upon receipt of an inappropriate (e.g., incorrect)
response to the challenge and/or failure to receive a re-
sponse to the challenge in a particular time period (e.g.,
4 hours), the challenge component can delete the e-mail
message from the spam folder(s) and/or change at-
tribute^) of the e-mail message stored in the spam fold-
er(s).
[0019] Another aspect of the present invention pro-
vides for a system to further include a legitimate e-mail
scnder(s) store and/or a spam sender(s) store. The le-
gitimate e-mail sender(s) store stores information (e.g.,
e-mail address) associated with sender(s) of legitimate
e-mail. E-mail message(s) from sender(s) identified in
the legitimate e-mail sender(s) store are generally not
challenged rjy the challenge component. Information (e.
g.. e-mail address(es)) can be stored in the legitimate
e-mail sender(s) store based on user selection (e.g., "do
not challenge" particular sender command), a user's ad-
dress book, address(es) to which a user has sent at
least a specified number of e-mail messages and/or by
the challenge component. The legitimate e-mail sender
(s) store can further store a confidence level associated
with a sender of legitimate e-mail. E-mail message(s)
having associated probabilities less than or equal to the
associated confidence level of the sender are not chal-
lenged by the challenge component while those e-mail
message(s) having associated probabilities greater
than the associated confidence level are challenged by
the challenge component. The spam sender(s) store
stores information (e.g., e-mail address) associated with
a sender of spam. Information can be stored in the spam
sender(s) store by a user and/or by the challenge com-
ponent.
[0020] To the accomplishment of the foregoing and re-
lated ends, certain illustrative aspects of the invention
are described herein in connection with the following de-
scription and the annexed drawings. These aspects are
indicative, however, of but a few of the various ways in
which the principles of the invention may be employed
and the present invention is intended to include all such
aspects and their equivalents. Other advantages and
novel features of the invention may become apparent
from the following detailed description of the invention
when considered in conjunction with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021]
Fig. 1 is a block diagram of a system for detection
of unsolicited e-mail in accordance with an aspect
of the present invention.
Fig. 2 is a block diagram of a system for detection
10
15
20
25
of unsolicited e-mail in acjoj^ance with an aspect
of the present invention.
Fig. 3 is a block diagram of a system for detection
of unsolicited e-mail in accordance with an aspect
of the present invention.
Fig. 4 is a block diagram of a system for detection
of unsolicited e-mail in accordance with an aspect
of the present invention.
Fig. 5 is a block diagram of a system for detection
of unsolicited e-mail in accordance with an aspect
of the present invention.
Fig. 6 is a block diagram of a system for detection
of unsolicited e-mail in accordance with an aspect
of the present invention.
Fig. 7 is a block diagram of a system for responding
to a challenge in accordance with an aspect of the
present invention.
Fig. 8 is a flow chart illustrating a method for detect-
ing unsolicited e-mail in accordance with an aspect
of the present invention.
Fig. 9 is a flow chart further illustrating the method
of Fig. 8.
Fig. 10 is a flow chart illustrating a method for re-
sponding to a challenge in accordance with an as-
pect of the present invention.
Fig. 11 is a flow chart illustrating a method for re-
sponding to challenges in accordance with an as-
pect of the present invention.
Fig. 12 is an exemplary user interface for respond-
ing to a plurality of challenges in accordance with
an aspect of the present invention.
Fig. 13 illustrates an example operating environ-
ment in which the present invention may function.
35 DETAILED DESCRIPTION OF THE INVENTION
[0022] The present invention is now described with
reference to the drawings, wherein like reference nu-
merals are used to refer to like elements throughout. In
the following description, for purposes of explanation,
numerous specific details are set forth in order to pro-
vide a thorough understanding of the present invention.
It may be evident, however, that the present invention
may be practiced without these specific details. In other
instances, well-known structures and devices are
shown in block diagram form in order to facilitate de-
scribing the present invention.
[0023] As used in this application, the term "computer
component" is intended to refer to a computer- related
entity, either hardware, a combination of hardware and
software, software, or software in execution. For exam-
ple, a computer component may be, but is not limited to
being, a process running on a processor, a processor,
an object, an executable, a thread of execution, a pro-
gram, and/or a computer. By way of illustration, both an
application running on a server and the server can be a
computer component. One or more computer compo-
nents may reside within a process and/or thread of ex-
30
40
45
50
55
5
BNSDOCID: <EP
1 376427 A2J_>
EP 1 376 427 A2
10
ecution and a component may be localized on one com-
puter and/or distributed between two or more comput-
ers.
[0024] Referring to Fig. 1 , a system 1 00 for detection
of unsolicited messages (e.g., e- mail) in accordance
with an aspect of the present invention is illustrated. The
system 100 includes an e-mail component 110 and a
challenge component 1 20. The system 1 00 can receive
e-mail message(s) and associated probabilities that the
e-mail message(s) are spam. Based, at least in part, up-
on the associated probability the system 100 can send
a challenge to a sender of an e-mail message.
[0025] The e-mail component 110 receives and/or
stores e-mail message(s) receives and/or computes as-
sociated probabilities that the e-mail messages are
spam. For example, the e-mail component 1 1 0 can store
information based, at least in part, upon information re-
ceived from a mail classifier (not shown). In one exam-
ple, e-mail message(s) are stored in the e-mail compo-
nent 110 based on associated probabilities that the
email message(s) are spam. In another example, the e-
mail component 110 receives e-mail message(s) and
computes associated probabilities that the e-mail mes-
sage^) are spam.
[0026] The challenge component 120 can send a
challenge to a sender of an e-mail message having an
associated probability greaterthan a first threshold. For
example, the challenge can be based, at least in part,
upon a code embedded within the challenge (e.g., al-
phanumeric code). In responding to the challenge, the
sender of the e-mail can reply with the code. In one ex-
ample, the sender's system (not shown) can be adapted
to automatically retrieve the embedded code and re-
spond to the challenge. Alternatively and/or additionally,
the sender can be prompted to respond to the challenge
(e.g., manually). The use of a challenge based on an
embedded code can increase the bandwidth and/or
computational load of sender(s) of spam, thus, serving
as a deterrent to the sending of spam.
[0027] Additionally and/or alternatively the challenge
can be a computational challenge, a human challenge
and/or a micropayment request. These challenges and
responses to these challenges are discussed more fully
below. Further, the challenge can be fixed and/or varia-
ble. For example, with an increased associated proba-
bility, the challenge component 120 can send a more
difficult challenge or one that requires a greater micro-
payment.
[0028] For example, a micropayment request can op-
tionally utilize one-time-use spam certificates. A system
100 can put a "hold" on a received spam certificate.
When a user of the system 1 00 reads the message and
marks it as spam, the spam certificate is invalidated -
sender unable to use spam certificate any further. If the
message is not marked as spam, the hold is released
thus allowing the sender to reuse the spam certificate
(e.g., sender of message not charged money). In an al-
ternate implementation, the spam certificate is always
invalidated at receipt, rea,ardl§§s^>f whether the mes-
sage was marked as spam or not.
[0029] With regard to a computational challenge, in
one implementation a challenge sender (message re-
5 ceiver) can determine what thecomputational challenge^
should be. However, in another implementation, the
challenge is uniquely determined by some combination
of the message content., the time of receipt or sending
of the message, the message sender, and, importantly,
10 the message recipient. For example, the computational
challenge may be based on a one-way hash of these
quantities. If the challenge sender (message recipient)
is allowed to choose the challenge, than a spammer
might be able to use the following technique. He sub-
15 scribes to mailing lists or otherwise generates mail from
users. Thus, responders send messages back to the
spammer to which the spammer responds with a com-
putational challenge of his choice. In particular the
spammer can choose challenges that legitimate users
20 have previously sent to the spammer in response to
spam! Some percentage of the recipients of the spam-
mer's challenges solve the challenges, thus allowin g the
spammer to then answer the challenges sent to the
spammer. In one implementation, the computational
25 challenge is based on a one-way hash of the message
(including time and recipient stamps), making it virtually
impossible for sender or receiver to determine the chal-
lenge, but making it possible for each to verify that a
challenge serves its intended purpose.
30 [0030] The challenge component 1 20 can modify the
associated probability that the e-mail message is spam
based, at least in part, upon a response to the challenge.
For example, upon receipt of an appropriate (e.g., cor-
rect) response to the challenge, the challenge compo-
35 nent 120 can decrease the associated probability that
the e-mail message is spam. In one example, the e-mail
message is moved from a spam folder to a legitimate e-
mail folder. In another example, the e-mail message is
moved from a questionable spam folder to a legitimate
40 e-mail folder. Moreover, upon receipt of an inappropriate
(e.g., incorrect) response to the challenge and/or failure
to -receive a response to the challenge in a particular
time period (e.g., 4 hours), the challenge component
120 can increase the associated probability that the e-
45 mail message is spam.
[0031] In one implementation, a user is given a choice
of challenges. For example, the choice of challenges
can be based upon a filter.
[0032] Further instead of storing the e-mail message,
50 the system 100 can "bounce" the message, thus, ne-
cessitating the sender to resend the message along with
the response to the challenge.
[0033] While Fig. 1 is a block diagram illustrating com-
ponents for the system 100, it is to be appreciated that
55 the challenge component 120 can be implemented as
one or more computer components, as that term is de-
fined herein. Thus, it is to be appreciated that computer
executable components operable to implement the sys-
6
• <EP 1 376427 A2_l_>
t1
EP 1 376 427 A2
12
tern 100 and/or the challenge component 120 can be
stored on computer readable media including, but not
limited to., an ASIC (application specific integrated cir-
cuit), CD (compact disc), DVD (digital video disk), ROM
(read only memory), floppy disk : hard disk, EEPROM 5
(electrically erasable programmable read only memory)
and memory stick in accordance with the present inven-
tion.
[0034] Turning to Fig. 2 : a system 200 for detection of
unsolicited e-mail in accordance with an aspect of the 10
present invention is illustrated. The system 200 includes
an e-mail component 110, a challenge component 120
and a mail classifier 130. An exemplary mail classifier
1 30 is set forth in greater detail in copending U.S. Patent
Application entitled A TECHNIQUE WHICH UTILIZES 1$
A PROBABILISTIC CLASSIFIER TO DETECT "JUNK"
E-MAIL, having serial no. 09/102,837 the entirety of
which is hereby incorporated by reference. In one ex-
ample, the mail classifier 130 receives e-mail message
(s), determines the associated probability that the e-mail 20
message is spam and stores the e-mail message(s) and
associated probabilities in the e-mail component 110.
The mail classifier 130 analyzes message content for a
given recipient and distinguishes, based on that content
and for that recipient, between spam and legitimate 25
(non-spam) messages and so classifies each incoming
e-mail message for that recipient.
[0035] In another example, each incoming e-mail
message (in a message stream) is first analyzed to as-
sess which one(s) of a set of predefined features, par- 30
ticularly characteristic of spam, the message contains.
These features (e.g., the "feature set") include both sim-
ple-word-based features and handcrafted features, the
latter including, for example, special multi-word phrases
and various features in e-mail messages such as non- 35
word distinctions. Generally speaking, these non-word
distinctions collectively relate to, for example, format-
ting, authoring, delivery and/or communication at-
tributes that, when present in a message, tend to be in-
dicative of spam -- they are domain-specific character- 40
istics of spam. Illustratively, formatting attributes may in-
clude whether a predefined word in the text of a mes-
sage is capitalized, or whether that text contains a series
of predefined punctuation marks. Delivery attributes
may illustratively include whether a message contains 45
an address of a single recipient or addresses of a plu-
rality of recipients, or a time at which that message was
transmitted (mail sent in the middle of the night is more
likely to be spam). Authoring attributes may include, for
example, whether a message comes from a particular 50
e-mail address. Communication attributes can illustra-
tively include whether a message has an attachment (a
spam message rarely has an attachment), or whether
the message was sent by a sender having a particular
domain type (most spam appears to originate from ". 55
com" or ".net" domain types). Handcrafted features can
also include tokens or phrases known to be, for exam-
ple, abusive, pornographic or insulting; or certain punc-
tuation marks or groupings, suc j^as repeated exclama-
tion points or numbers, that are each likely to appear in
spam. The specific handcrafted features are typically
determined through human judgment alone or com-
bined with an empirical analysis of distinguishing at-
tributes of spam messages.
[0036] A feature vector, with one element for each fea-
ture in the set, is produced for each incoming e-mail
message. That element simply stores a binary value
specifying whether the corresponding feature is present
or not in that message. The vector can be stored in a
sparse format (e.g., a list of the positive features only).
The contents of the vector are applied as input to a prob-
abilistic classifier, preferably a modified support vector
machine (SVM) classifier, which, based on the features
that are present or absent from the message, generates
a probabilistic measure as to whether that message is
spam or not. This measure is then compared against a
preset threshold value. If, for any message, its associ-
ated probabilistic measure equals or exceeds the
threshold, then this message is classified as spam (e.
g., stored in a spam folder). Alternatively, if the probabi-
listic measure for this message is less than the thresh-
old, then the message is classified as legitimate (e.g.,
stored in a legitimate mail folder). The classification of
each message can also be stored as a separate field in
the vector for that message. The contents of the legiti-
mate mail folder can then be displayed by a client e-mail
program (not shown) for user selection and review. The
contents of the spam folder will. only be displayed by the
client e-mail program upon a specific user request.
[0037] Furthermore, the mail classifier 130 can be
trained using a set of M e-mail messages (e.g., a "train-
ing set", where M is an integer) that have each been
manually classified as either legitimate or spam. In par-
ticular, each of these messages is analyzed to deter-
mine from a relatively large universe of n possible fea-
tures (referred to herein as a "feature space"), including
both simple-word-based and handcrafted features, just
those particular N features (where n and N are both in-
tegers, n > N) that are to comprise the feature set for
use during subsequent classification. Specifically, a ma-
trix (typically sparse) containing the results for all n fea-
tures for the training set is reduced in size through ap-
plication of Zipf s Law and mutual information, both as
discussed in detail infratoXhe extent necessary, to yield
a reduced N-by-m feature matrix. The resulting N fea-
tures form the feature set that will be used during sub-
sequent classification. This matrix and the known clas-
sifications for each message in the training set are then
collectively applied to the mail classifier 130 for training
thereof.
[0038] Furthermore, should a recipient manually
move a message from one folder to another and hence
reclassify it, such as from being legitimate into spam,
the contents of either or both folders can be fed back as
a new training set to re-train and hence update the clas-
sifier. Such re-training can occur as a result of each mes-
BNSDOCID: <EP 1 376427 A2J_>
13
EP 1 376 427 A2
14
sage reclassification; automatically after a certain
number of messages have been reclassif ied; after a giv-
en usage interval (e.g., several weeks or months) has
elapsed: or upon user request. In this manner, the be-
havior of the classifier can advantageously track chang-
ing subjective perceptions and preferences of its partic-
ular user. Alternatively, e-mail messages may be clas-
sified into multiple categories (subclasses) of spam (e.
g.. commercial spam, pornographic spam and so forth).
In addition, messages may be classified into categories
corresponding to different degrees of spam (e.g., "cer-
tain spam", "questionable spam", and "non-spam").
[0039] Based, at least in part, upon information pro-
vided by the mail classifier 130, the challenge compo-
nent 1 20 can send a challenge to a sender of an e-mail
message having an associated probability greater than
a first threshold. For example, the challenge can be
based, at least in part, upon a code embedded within
the challenge (e.g., alphanumeric code). In responding
to the challenge, the sender of the e-mail can reply with
the code. The sender's system (not shown) can be
adapted to automatically retrieve the embedded code
and respond to the challenge. Alternatively and/or addi-
tionally, the sender can be prompted to respond to the
challenge (e.g., manually). The use of a challenge
based on an embedded code can increase the band-
width and/or computational load of sender(s) of spam,
thus, serving as a deterrent to the sending of spam. It is
to be appreciated that any type of challenge (e.g., a
computational challenge, a human challenge, a micro-
payment request) suitable for carrying out the present
invention can be employed and all such types of chal-
lenges are intended to fall within the scope of the hereto
appended claims.
[0040] The challenge component 1 20 can modify the
associated probability that an e-mail message is spam
based, at least in part, upon a response to the challenge.
For example, upon receipt of an appropriate (e.g., cor-
rect) response to the challenge, the challenge compo-
nent 120 can decrease the associated probability that
the e-mail message is spam.
[0041] Upon receipt of an inappropriate (e.g., incor-
rect) response to the challenge and/or failure to receive
a response to the challenge in a particular time period
(e.g., 4 hours), the challenge component 120 can in-
crease the associated probability that the e-mail mes-
sage is spam. It is to be appreciated that the mail clas-
sifier 130 can be a computer component as that term is
defined herein.
[0042] Referring next to Fig. 3, a system 300 for de-
tection of unsolicited e-maii in accordance with an as-
pect of the present invention is illustrated. The system
300 includes a mail classifier 310, a challenge compo-
nent 320, spam folder(s) 330 and legitimate e-maii fold-
ers) 340. In one implementation, the spam f older(s) 330
and/or the legitimate e-mail folder(s) 340 can be virtual,
that is, storing information associated with e-mail mes-
sage^) (e.g., link to e-mail message(s)) with the e-mail
message(s) stored elsewhere^rjp another implemep- _
tation, rather than folders, an attribute of the message,
can simply be set.
[0043] As discussed supra, the mail classifier 31 0 de-
5 termines the associated probability that an e-mail mes- —
sage is spam and stores the e-mail message in the
spam folder(s) 330 or the legitimate e-mail folder(s) 340
(e.g., based on a first threshold). Incoming e-mail mes-
sage^) are applied to an input of the mail classifier 31 0,
10 which, in turn, probabilistically. classifies each of these
messages as either legitimate or spam. Based on its
classification, the e-mail message is routed to either of
the spam folder(s) 330 or the legitimate e-mail foider(s)
340. Thus, e-mail message(s) having associated prob-
15 abilities less than or equal to a first threshold are stored
in a legitimate e-mail folder(s) 340 while e-mail message
(s) having associated probabilities greater than the first
threshold are stored in a spam folder(s) 330. The first
threshold can be fixed, based on user preference(s)
20 and/or adaptive (e.g., based, at least in part, upon avail-
able computational resources).
[0044] Thereafter, the challenge component 320 can
send a challenge to a sender of an e-mail message
stored in the spam folder(s) (e.g., having an associated
25 probability greaterthan the first threshold). For example,
the challenge can be based, at least in part, upon a code
embedded within the challenge, a computational chal-
lenge, a human challenge and/or a micropayment re-
quest. Based, at least in part, upon a response to the
30 challenge, the challenge component 320 can move the
e-mail message from the spam folder(s) 330 to the le-
gitimate e-mail folder(s) 340. For example, upon receipt
of an appropriate (e.g., correct) response to the chal-
lenge, the challenge component 320 can move the e-
35 mail message from the spam folder(s) 330 to the legiti-
mate e-mail folder(s) 340.
[0045] Upon receipt of an inappropriate (e.g., incor-
rect) response to the challenge and/or failure to receive
a response to the challenge in a particular time period
40 (e.g., 4 hours), the challenge component 320 can delete
the e-mail message from the spam folder(s) 330 and/or
change attribute(s) of the e-mail message stored in the
spam f older(s) 330. For example, display attribute(s) (e.
g., color) of the e-mail message can be changed to bring
45 to a user's attention the increased likelihood of the e-
mail message being spam.
[0046] Next, turning to Fig. 4, a system 400 for detec-
tion of unsolicited e-mail in accordance with an aspect
of the present invention is illustrated. The system 400
50 includes a mail classifier 310, a challenge component
320, spam folder(s) 330 and legitimate e-mail folder(s)
340. The system 400 further includes a legitimate e-mail
sender(s) store 350 and/or a spam sender(s) store 360.
The legitimate e-mail sender(s) store 350 stores infor-
55 mation (e.g., e-mail address) associated with sender(s)
of legitimate e-mail. E-mail message(s) from sender(s)
identified in the legitimate e-mail sender(s) store 350 are
generally not challenged by the challenge component
8
BNSDOCID: <EP 1376427A2_L>
15
EP 1 376 427 A2
16
320. Accordingly, in one example, e-mail message(s)
stored in the spam tolder(s) 330 by the mail classifier
310 are moved to the legitimate mail folder(s) 340 if the
sender of the e-mail message is stored in the legitimate
e-mail sender(s) store 350.
[0047] Information (e.g., e-mail address(es)) can be
stored in the legitimate e-mail sender(s) store 350 based
on user selection (e.g., "do not challenge" particular
sender command), a user's address book, address(es)
to which a user has sent at least a specified number of
e-mail messages and/or by the challenge component
320. For example, once a sender of an e-mail message
has responded correctly to a challenge, the challenge
component 320 can store information associated with
the sender (e.g., e-mail address) in the legitimate e-mail
sender(s) store 350.
[0048] The legitimate e-mail sender(s) store 350 can
further retain a confidence level associated with a send-
er of legitimate e-mail. E-mail message(s) having asso-
ciated probabilities less than or equal to the associated
confidence level of the sender are not challenged by the
challenge component 320 while those e-mail message
(s) having associated probabilities greater than the as-
sociated confidence level are challenged by the chal-
lenge component 320. For example, the confidence lev-
el can be based, at least in part, upon the highest asso-
ciated probability challenge to which the sender has re-
sponded.
[0049] In one implementation, a sender can be re-
moved from the legitimate e-mail sender(s) store 350
based, at least in part, upon a user's action (e.g., e-mail
message from the sender deleted as spam). In accord-
ance with another aspect, sender(s) are added to the
legitimate e-mail sender(s) store 350 after a user has
sent one e-mail message to the sender - this can be use-
ful for mailing list(s).
[0050] The spam sender(s) store 360 stores informa-
tion (e.g., e-mail address) associated with a sender of
spam. Information can be stored in the spam sender(s)
store 360 by a user and/or by the challenge component
320. For example, once a user has deleted a particular
e-mail message as spam, information associated with
the sender of the e-mail message can be stored in the
spam sender(s) store 360. In another example, informa-
tion associated with a sender of an e-mail message that
incorrectly responded to a challenge and/or failed to re-
spond to the challenge can be stored in the spam sender
(s) store 360.
[0051] Fig. 5 illustrates a system 500 for detection of
unsolicited e-mail in accordance with an aspect of the
present invention is illustrated. The system 500 includes
a mail classifier 51 0, a challenge component 520, spam
folder(s) 530, questionable spam folder(s) 540 and le-
gitimate e-mail folder(s) 550. As discussed above, the
mail classifier 51 0 determines the associated probability
that an e-mail message is spam and stores the e-mail
message in the spam folder(s) 530, the questionable
spam f older(s) 540 or the legitimate e-mail fotder(s) 550.
Incoming e-mai! message(s) areapplied to an input of
the mail classifier 510, which, in turn, probabilistically
classifies each of these messages as either legitimate,
questionable spam or spam. Based on its classification,
5 each message is routed to one of the spam folder(s)
530, the questionable spam folder(s) 540 or the legiti-
mate e-mail folder(s) 550.
[0052] E-mail message(s) having associated proba-
bilities less than or equal to a first threshold are in legit-
10 imate e-mail folder(s) 550. Enmail message(s) having
associated probabilities greater than the first threshold,
but less than or equal to a second threshold are stored
in questionable spam folder(s) 540. Further, e-mail mes-
sage^) having associated probabilities greater than the
15 second threshold are stored in spam folder(s) 530. It is
to be appreciated that the first threshold and/or the sec-
ond threshold can be fixed, based on user preference
(s) and/or adaptive (e.g., based, at least in part, upon
available computational resources). Thereafter, the
20 challenge component 520 can send a challenge to a
sender of an e-mail message stored in the questionable
spam folder(s) 540. For example, the challenge can be
based, at least in part, upon a code embedded within
the challenge, a computational challenge, a human
25 challenge and/or a micropayment request.
[0053] Based, at least in part, upon a response to the
challenge or lack thereof, the challenge component 520
can move the e-mail message from the questionable
spam folder(s) 540 to the legitimate e-mail folder(s) 550
30 or the spam folder(s) 530. For example, upon receipt of
an appropriate (e.g., correct) response to the challenge,
the challenge component 520 can moved the e-mail
message from the questionable spam folder(s) 540 to
the legitimate e-mail folder(s) 550.
35 [0054] Further, upon receipt of an inappropriate (e.g.,
incorrect) response to the challenge and/or failure to re-
ceive a response to the challenge in a particular time
period (e.g., 4 hours), the challenge component 520 can
move the e-mail message from the questionable spam
40 folder(s) 540 to the spam folder(s) 530.
[0055] Referring next to Fig. 6, a system 600 for de-
tection of unsolicited e-mail in accordance with an as-
pect of the present invention is illustrated. The system
600 includes a mail classifier 510, a challenge compo-
45 nent 520, spam folder(s) 530, questionable spam folder
(s) 540 and legitimate e-mail folder(s) 550. The system
600 further includes a legitimate e-mail sender(s) store
560 and/or a spam sender(s) store 570.
[0056] The legitimate e-mail sender(s) store 560
50 stores information (e.g., e-mail address) associated with
sender(s) of legitimate e-mail. E-mail message(s) from
entities identified in the legitimate e-mail sender(s) store
560 are generally not challenged by the challenge com-
ponent 520. Accordingly, in one example, e-mail mes-
55 sage(s) stored in the spam folder(s) 530 or the ques-
tionable spam folder(s) 540 by the mail classifier 510
are moved to the legitimate mail folder(s) 550 if the
sender of the e-mail message is stored in the legitimate
9
BNSDOCID: <EP
1376427A2_I_>
17
EP 1 376 427 A2
18
e-mail sender(s) store 560.
[0057] Information (e.g., e-mail address(es)) can be
stored in the legitimate e-mail sender(s) store 660 based
on user selection (e.g., "do not challenge" particular
sender command), a user's address book, address(es)
to which a user has sent at least a specified number of
e-mail messages and/or by the challenge component
520. For example, once a sender of an e-mail message
has responded correctly to a challenge, the challenge
component 520 can store information associated with
the sender (e.g., e-mail address) in the legitimate e-mail
sender(s) store 560.
[0058] The legitimate e-mai! sender(s) store 560 can
further store a confidence level associated with a sender
of legitimate e-mail. For example, e-mail message(s)
having associated probabilities less than or equal to the
associated confidence level of the sender are not chal-
lenged by the challenge component 520 while those e-
mail message(s) having associated probabilities greater
than the associated confidence level are challenged by
the challenge component 520. For example, the confi-
dence level can be based, at least in part, upon the high-
est associated probability challenge to which the sender
has responded.
[0059] In one example, a sender can be removed from
the legitimate e-mail sender(s) store 560 based, at least
in part, upon a user's action (e.g., e-mail message from
the sender deleted as spam), in another example, send-
ees) are added to the legitimate e-mail sender(s) store
560 after a user has sent one e-mail message to the
sender.
[0060] The spam sender(s) store 570 stores informa-
tion (e.g., e-mail address) associated with a sender of
spam. Information can be stored in the spam sender(s)
store 570 by a user and/or by the challenge component
520. For example, once a user has deleted a particular
e-mail message as spam, information associated with
the sender of the e-mail message can be stored in the
spam sender(s) store 570. In another example, informa-
tion associated with a sender of an e-mail message that
incorrectly responded to a challenge and/or failed to re-
spond to the challenge can be stored in the spam sender
(s) store 570.
[0061] In one example, a unique-ID can be ex-
changed during the challenge process {e.g., to reduce
the likelihood that a spammer can send spam using an
address of a good sender). Further, sender(s) can use
message signing. Unsigned message(s) from sender(s)
stored in the legitimate e-mail sender(s) store 560 who
usually sign their message(s) are subjected to the usual
processing and potential challenging.
[0062] In another example, higher volume sender(s)
of e-mail customize their "from" address (e.g., a unique
"from" address for a recipient). For example, the "from"
address can be based on a global secret key known to
the sender and hashed with the recipient's e-mail ad-
dress. Alternatively, a random number can be generated
and stored for a recipient.
[0063] In yet a third exa mnje, ^ "per recipient, IJQ"
(PRID) is included in e-mail message(s). The PRID ap-
pends sender unique information in a special message
header field. It is to be appreciated that the PRID does
5 not have to be set on a pej-sender basjs. Thus, as mail
is forwarded around an organization, inclusion on the
legitimate e-mail sender(s) store 560 can be inherited.
The PRID can be a public key for use with a public key
signature system (e.g., OpenPGP or S/MIME).
w [0064] Additionally, sendees) of e-mail message(s)
can include requests for challenge(s) (e.g., to facilitate
scheduling of receipt of challenge(s)). For example, an
e-mail message(s) can include a
"CHALLENGEJv1E_NOW: TRUE" header. This can
15 cause a system 600 to automatically send a challenge
and when a correct response is received to include the
sender in the legitimate e-mail sender(s) store 560.
[0065] The challenge component 520 can be adapted
to detect e-mail message(s) received from mailing list
20 (s) (e.g., moderated mailing list(s) and/or unmoderated
mailing list(s)). For example, a header line such as
"Precedence: list" or "Precedence: bulk" can be includ-
ed in e-mail message(s) received from a mailing list. In
another example, the challenge component 520 detects
25 that an e-mail message is spam based, at least in part
upon, detection of a "sender" line being different from a
"from" line. E-mail message header(s) typically contains
two different from lines: one "from" line at the top (e.g.,
inserted by the from command used by SMTP), and a
30 "from:" header field (e.g., the one that is usually dis-
played to the user.) For mailing lists, these may differ.
[0066] In one example, the challenge component 520
can detect e-mail message(s) from mailing list(s) and
give a user the opportunity to include the mailing list(s)
35 in the legitimate e-mail sender(s) store 560. The chal-
lenge component 520 can additionally include a level of
confidence associated with the mailing list(s).
[0067] An issue to be addressed with regard to mail-
ing list(s) is to reduce the likelihood that spam-like mes-
40 sage(s) received from a mailing list will create a mail
storm of challenges to the mailing list. This issue differs
for the different list types. There are 8 situations, al-
though many of them share the same solution. In par-
ticular, a mailing list can be can be moderated or un-
45 moderated and additionally can have different level(s)
of ability to respond to challenges. This creates 8 types.
[0068] Many moderated mailing list(s) include an "ap-
proved-by" header. For example, for moderated mailing
list(s), it can be assumed that either all messages are
so good, or all are spam. For unmoderated lists, it can be
assumed that some spam will be sent to the mailing list.
Thus, for an unmoderated mailing list, the challenge
component 520 can allow a user to set a threshold de-
termining whether spam-like messages should be
55 shown, or simply put in the spam folder(s) 530.
[0069] For example, an e-mail message from a mail-
ing list has been detected, a user is given the user the
opportunity to determine the level of confidence associ-
10
BNSDOCID: <EP 1 376427 A2J_>
t9
EP 1 376 427 A2
20
ated with the mailing list. A concern is sending too many
challenges to mailing lists, especially those that do not
have the ability to automatically respond to challenges.
For moderated mailing list(s), for example, a user can
be prompted to include the mailing list in the legitimate 5
e-mail sender(s) store 560. In another example, the
mailing list can respond to a challenge from the chal-
lenge component 520 and be included in the legitimate
e-mail sender(s) store 560. In yet a third example, upon
subscription to the mailing list, the mailing list prompts io
the userto include the mailing list in the user's legitimate
e-mail sender(s) store 560.
[0070] For unmoderated mailing list(s), for example,
a user can be prompted to set a threshold for the mailing
list. E-mail message(s) having a probability of being is
spam above the threshold is moved to the spam folder
(s) 530 and/or deleted. In another example, the mailing
list can respond to a challenge from the challenge com-
ponent 520 and be included in the legitimate e-mail
sender(s) sfore 560. In yet a third example, upon sub- 20
scription to the mailing list, the mailing list prompts the
user to include the mailing list in the user's legitimate e-
mail sender(s) store 560.
[0071] The challenge component 520 can take into
account mailing list(s) that do not have the ability to au- 25
tomatically respond to challenges. In particular, for mod-
erated mailing lists, the challenge component 520 can
include the mailing list in the legitimate e-mail sender(s)
store 560. For unmoderated mailing lists, the challenge
component 520 can facilitate setting a threshold for the 30
mailing list: messages above the threshold are chal-
lenged while messages below the threshold are let
through
[0072] Inclusion in the legitimate e-mail sender(s)
store 560 can occur at an appropriate time. For mailing 35
lists, it is likely that the user will not send mail TO the
list. However, it is undesirable to include the mailing list
in the legitimate e-mail sender(s) store 560 based on
small amounts of mail received FROM the list. Other-
wise a spammer could masquerade as a mailing list, 40
send a small amount of messages (none of which are
deleted as spam) and then send spam freely. In one im-
plementation, the first time that mail from a mailing list
arrives, and is not detected as spam : the user is prompt-
ed to add the mailing list to the legitimate e-mail sender 45
(s) store 560, with an associated threshold. Since most
mailing lists include a welcome message, if some wel-
come messages are included in training data, the wel-
come message is unlikely to be marked as spam.
[0073] If, however, the first messages that arrive are so
substantially all spam-like : then the messages should
be included in the spam folder(s) 530. In particular, it is
not desirable to let someone masquerade as a mailing
list, and send spam. Thus, until the mail listing is includ-
ed in the legitimate e-mail sender(s) store 560, the chal- 55
lenge component 520 can send challenge(s) to the mail-
ing list as described supra. If the messages are spam-
like but legitimate, the user may or may not receive
them, depending on howjhe cjjajjenges are handled. If
the challenges are not answered, they will not get
through. Thus, it should be difficult to get spam through.
Eventually, the mailing list will send a non-spam like
message,_and the user will bejDrompted to establish a
policy for the mailing list.
[0074] It is to be appreciated that mailing list(s) may
have a From address such that mail sent to that From
address is sent to the entire list. If a list appears to be
of that type, it is undesirable tosend challenges to it as
they might be received by substantially all readers of the
mailing list. Apparent spam from such a mailing list be-
fore the mailing list has been included in the legitimate
e-mail sender(s) store 560 can simply be ignored.
The definition of inclusion in the legitimate e-mail sender
(s) store 560 can be modified for mailing list(s). Given
that the From line on a mailing list, even a moderated
one is different for each sender, inclusion in the legiti-
mate e-mail sender(s) store 560 can be based on other
part(s) of the header. Often, the To line on a mailing list
is the mailing list name (so that reply-all goes to the
whole list.). Thus, for mailing lists, inclusion in the legit-
imate e-mail sender(s) store 560 can be based, at least
in part, on the to-line. This can be in addition to frorn-
line listing (e.g., if the sender of the mailing list is in the
legitimate e-mail sender(s) store 560 that also should
be sufficient). It is to be appreciated that other header
lines, for mailing lists, such as sent-by, that can addi-
tionally and/or alternatively be included in the legitimate
e-mail sender(s) store 560. „
[0075] In order to determine validity of e-mail address
(es), spammer(s) rely on "bouncing". Many convention-
al e-mail servers bounce e-mail back to it's sender if it
is addressed to an invalid address. Thus, for e-mail serv-
ers those e-mail servers, the indicia of validity of an e-
mail address increases if an e-mail message is'not
bounced. Accordingly, spammers can send more spam
messages to the unbounced addresses.
[0076] For those e-mail servers which bounce e-mail,
challenges of the present invention do not provide any
additional information to the spammer (e.g., lack of
bounce is an indication of validity of the address). Fur-
ther, the e-mail server can itself send challenges via a
system for detection of unsolicited e-mail for "semi-live"
address(es) (e.g., valid but un monitored address).
[0077] With regard to e-mail servers which do not
bounce e-mail to invalid addresses, again the e-mail
server can itself send challenges via a system for de-
tection of unsolicited e-mail, for example, to have be-
havior of invalid address(es) be similar to the behavior
of valid address(es). Further in one implementation, a
randomization factor is added to the probability that an
e-mail is spam by the server system (e.g., to prevent
attempts to circumvent adaptive spam filters).
[0078] Next, turning to Fig. 7, a system 700 for re-
sponding to a challenge in accordance with an aspect
of the present invention is illustrated. The system 700
includes a challenge receiver component 710, a chal-
11
BNSDOCID:<EP 1376427A2J >
21
EP 1 376 427 A2
lenge processor component 720 and a challenge re-
sponse component 730.
[0079] The challenge receiver component 710 re-
ceives a challenge (e.g., to a previously sent e-mail).
For example the challenge can be based, at least in part,
upon a code embedded within the challenge, a compu-
tational challenge, a human challenge and/or a micro-
payment request.
[0080] In one example, the challenge receiver com-
ponent 71 0 determines which of a plurality of challenge
modalities to forward to the challenge processor com-
ponent 720 (e.g., based on available computational re-
sources and/or user preference). In another example,
the challenge receiver component 710 provides infor-
mation to a user to facilitate selection of one of a plurality
of challenge modalities, thus, allowing a user to select
which modality, if any, the user wishes to use to respond
to the challenge. For example, the challenge receiver
component 710 can provide information which may be
helpful to theTiser in selecting an appropriate response
modality, such as, an amount of computational resourc-
es required to respond to a computational challenge, an
amount of a micropayment and/or a balance of a micro-
payment account. Once a challenge modality has been
selected, the challenge is forwarded to the challenge
processor 720.
[0081] It is to be appreciated that in certain instances
the user may desire to not respond to the challenge, in
which case, no information is sent to the challenge proc-
essor component 720 and/or the challenge response
component 730.
[0082] The challenge processor component 720 proc-
esses the challenge and provides an output associated
with the processed challenge. For example, when the
challenge includes an embedded code, the challenge
processor component 720 can provide an output to the
challenge response component 730 which includes the
embedded code. In the instance in which the challenge
includes a computational challenge, the challenge proc-
essor component 720 can facilitate generation of a so-
lution to the computational challenge.
[0083] When the challenge includes a human chal-
lenge, the challenge processor component 720 can pro-
vide information to a user to facilitate solving the human
challenge. !n one example, the human challenge can
include a problem that is relatively easy for a human to
solve, and relatively hard for a computer. In one exam-
ple, the human challenge includes an image of a word
(e.g., GIF or JPEG). The word is partially obscured by
noise. The noise makes it hard to automatically develop
a computer program to read the word (or at least, to use
0 ff_the-shelf components), without making it too hard for
a human to do it. In this example, the challenge proces-
sor component 720 can provide the image of the word
to the user. The user then provides the word back to the
challenge processor component 720. The challenge
processor component 720 provides an output including
the word to the challenge response component 730.
[0084] When the challengej^u/ies a micropayment _
request, the challenge processor component 720 can
facilitate providing an output to the challenge response
component 730. In one example, a response to a micro-
5 payment challenge is based on a one-time use "spam
certificate" which can be issued by an issuing authority.
The challenge processor component 720 can either au-
tomatically or based on user input provides a spam cer-
tificate number to the challenge response component
10 730. By providing the spam ce&if icate number, the spam
certificate is thereafter invalidated (e.g., one-time use).
[0085] In another example, a response to a micropay-
ment challenge is based on a micropayment account.
Each such response causes an amount to be removed
15 from a micropayment account maintained, for example,
by an issuing authority. The challenge processor com-
ponent 720 can provide information associated'with the
micropayment account to the challenge response com-
ponent 730.
20 [0086] The challenge response component 730 pro-
vides a response to the challenge based, at least in part,
upon the output associated with the processed chal-
lenge. For example, the response to the challenge can
include an embedded code, solution to a computational
25 challenge, solution to a human challenge and/or micro-
payment.
[0087] In one implementation, for example, to reduce
a likelihood of a denial-of-service attack, computational
challenges are ordered by the quantity of challenges al-
so ready processed for a given message. Message(s) with
fewer processed challenge(s) are processed before
message(s) having a greater quantity of processed
challenges are processed (e.g., as computational re-
sources are available). Thus, in the instance in which a
35 message is sent to a mailing list, a recipient could send
computational challenges in an effort to maliciously
cause a denial-of-service attack. However, once one or
more computational challenges are processed for that
message, computational challenges of other messages
40 having less processed challenges are given priority,
thus reducing the likelihood of a denial-of-service.
[0088] In view of the exemplary systems shown and
described above, methodologies that may be imple-
mented in accordance with the present invention will be
45 better appreciated with reference to the flow chart of
Figs. 8, 9, 10 and 11 . While, for purposes of simplicity
of explanation, the methodologies are shown and de-
scribed as a series of blocks, it is to be understood and
appreciated that the present invention is not limited by
50 the order of the blocks, as some blocks may, in accord-
ance with the present invention, occur in different orders
and/or concurrently with other blocks from that shown
and described herein. Moreover, not all illustrated
blocks may be required to implement the methodologies
55 in accordance with the present invention.
[0089] The invention may be described in the general
context of computer-executable instructions, such as . _
program modules, executed by one or more compo-
12
BNSDOCID: <EP.
1 376427 A2_l_>
23
EP 1 376 427 A2
24
nents. Generally, program modules include routines,
programs, objects, data structures, etc. that perform
particular tasks or implement particular abstract data
types. Typically the functionality of the program modules
may be combined or distributed as desired in various
embodiments.
[0090] Turning to Figs. 8 and 9, a method 800 for de-
tecting an unsolicited e-mail message in accordance
with an aspect of the present invention is illustrated. At
804, an e-mail message is received. At 808, a probability
that the e-mail message is spam is determined (e.g., by
a mail classifier).
[0091] At 812, a determination is made as to whether
the sender of the e-mail message is in a legitimate e-
mail sender(s) store. If the determination at 81 2 is YES,
processing continues at 816. If the determination at 81 2
is NO, at 820, a determination is made as to whether
the sender of the e-mail message is in a spam sender
(s) store. If the determination at 820 is YES, processing
continues at"^24. If the determination at 820 is NO, at
828, a determination is made as to whether the proba-
bility that the e-mail message is spam is greater than a
first threshold. If the determination at 828 is NO,
processing continues at 81 6. If the determination at 828
is YES, at 832. one or more challenge(s) are sent to the
sender of the e-mail message.
[0092] At 836, a determination is made as to whether
a response to the challenge(s) has been received. If the
determination at 836 is NO, processing continues at
836. If the determination at 836 is YES : at 840, a deter-
mination is made as to whether the response received
to the challenge is correct. If the determination at 840 is
YES, processing continues at 816. If the determination
at 840 is NO, processing continues at 824.
[0093] At 81 6, the e-mail message is identified as "not
spam" (e.g., placed in legitimate e-mail folder(s) and/or
associated probability decreased). Next, at 844, the
sender of the e-mail message is added to the legitimate
e-mail sender(s) store and no further processing
occurs. .
[0094] At 824, the e-mail message is identified as
spam (e.g., placed in spam folder(s), deleted and/or as-
sociated probability increased). Next, at 848, the sender
of the e-mail message is added to the spam sender(s)
store and no further processing occurs.
[0095] Referring next to Fig. 10, a method 1000 for
responding to a challenge in accordance with an aspect
of the present invention is illustrated. At 1 01 0, an e-mail
message is sent. At 1020, a challenge is received (e.g.,
an embedded code, a computational challenge, a hu-
man challenge and/or a request for a micropayment). At
1030, the challenge is processed. At 1040, a response
to the challenge is sent.
[0096] Next, turning to Fig. 1 1 , a method 1 1 00 for re-
sponding to challenges in accordance with an aspect of
the present invention is illustrated. At 1110, e-mail mes-
sage(s) are sent. At 1120, challenge(s) are received (e.
g. r each challenge having an embedded code, a com-
putational challenge, a humar^cjjallenge and/or a re :
quest for a micropayment). At 1 1 30, the challenge(s) to
be processed are ordered based, at least in part, upon
message(s) with fewer challenge(s) processed before
5 message(s) with more challengers) processed (e.g. : to
reduce denial-of-service attacks). At 1140. the chal-
lenge is processed. At 11 50, a response to the selected
challenge is sent. At 1160, a determination is made as
to whetherthere are more challenge(s) to process. If the
10 determination at 1160 is YES, processing continues at
1130. If the determination at 1160 is NO, no further
processing occurs.
[0097] Turning to Fig. 12, an exemplary user interface
1200 for responding to a plurality of challenges in ac-
15 cordance with an aspect of the present invention is il-
lustrated. In this exemplary user interface, a user is
prompted with the message:
THE E-MAIL MESSAGE YOU SENT HAS BEEN
20 DETECTEDAS POTENTIAL SPAM. UNLESS YOU
CORRECTLY RESPOND TO ONE OF THE CHAL-
LENGES IDENTIFIED BELOW, THE E-MAIL MES-
SAGE MAY BE IDENTIFIED AS SPAM AND/OR
DELETED AS SPAM.
25
[0098] The user is presented with three options: com-
puter computational challenge, human challenge and
micropayment. Based, at least in part, upon the user's
selection, the selected challenge can then be proc-
30 essed. .
[0099] In order to provide additional context for vari-
ous aspects of the present invention, Fig. 13 and the
following discussion are intended to provide a brief, gen-
eral description of a suitable operating environment
35 1310 in which various aspects of the present invention
may be implemented. While the invention is described
in the general context of computer-executable instruc-
tions, such as program modules, executed by one or
more computers or other devices, those skilled in the art
40 will recognize that the invention can also be implement-
ed in combination with other program modules and/or
as a combination of hardware and software. Generally,
however, program modules include routines, programs,
objects, components, data structures, etc. that perform
45 particular tasks or implement particular data types. The
operating environment 1310 is only one example of a
suitable operating environment and is not intended to
suggest any limitation as to the scope of use or func-
tionality of the invention. Other well known computer
50 systems, environments, and/or configurations that may
be suitable for use with the invention include but are not
limited to, personal computers, hand-held or laptop de-
vices, multiprocessor systems, microprocessor-based
systems, programmable consumer electronics, network
55 PCs, minicomputers, mainframe computers, distributed
computing environments that include the above sys-
tems or devices, and the like.
[0100] With reference to Fig. 13, an exemplary envi-
BNSDOCID: <EP
1376427A2J_>
13 Best Available Copy
25
EP 1 376 427 A2
26
ronment 1310 for implementing various aspects of the
invention includes a computer 1312. The computer
1312 includes a processing unit 1314, asystem memory
1316 r and a system bus 1318. The system bus 1318
couples system components including, but not limited
to, the system memory 1316 to the processing unit
1314. The processing unit 1314 can be any of various
available processors. Dual microprocessors and other
multiprocessor architectures also can be employed as
the processing unit 1314.
[0101] The system bus 1318 can be any of several
types of bus structure(s) including the memory bus or
memory controller, a peripheral bus or external bus,
and/or a local bus using any variety of available bus ar-
chitectures including, but not limited to, 13-bit bus, In-
dustrial Standard Architecture (ISA) : Micro-Channel Ar-
chitecture (MSA), Extended ISA (EISA), Intelligent Drive
Electronics (IDE), VESA Local Bus (VLB), Peripheral
Component Interconnect (PCI), Universal Serial Bus
(USB), Advanced Graphics Port (AGP), Personal Com-
puter Memory Card International Association bus (PC-
MCIA), and Small Computer Systems Interface (SCSI).
[0102] The system memory 1316 includes volatile
memory 1320 and nonvolatile memory 1322. The basic
input/output system (BIOS), containing the basic rou-
tines to transfer information between elements within
the computer 1312, such as during start-up, is stored in
nonvolatile memory 1 322. By way of illustration, and not
limitation, nonvolatile memory 1322 can include read
only memory (ROM), programmable ROM (PROM),
electrically programmable ROM (EPROM), electrically
erasable ROM (EEPROM), or flash memory. Volatile
memory 1320 includes random access memory (RAM),
which acts as external cache memory. By way of illus-
tration and not limitation, RAM is available in many
forms such as synchronous RAM (SRAM), dynamic
RAM (DRAM), synchronous DRAM (SDRAM), double
data rate SDRAM (DDR SDRAM), enhanced SDRAM
(ESDRAM), Synchlink DRAM (SLDRAM), and direct
Rambus RAM (DRRAM).
[0103] Computer 1312 also includes removable/non-
removable, volatile/nonvolatile computer storage me-
dia. Fig. 13 illustrates, for example a disk storage 1324.
Disk storage 1 324 includes, but is not limited to, devices
like a magnetic disk drive, floppy disk drive, tape drive,
Jaz drive, Zip drive, LS-100 drive, flash memory card,
or memory stick. In addition, disk storage 1324 can in-
clude storage media separately or in combination with
other storage media including, but not limited to, an op-
tical disk drive such as a compact disk ROM device
(CD-ROM), CD recordable drive (CD-R Drive), CD re-
writable drive (CD-RW Drive) or a digital versatile disk
ROM drive (DVD-ROM). To facilitate connection of the
disk storage devices 1324 to the system bus 1318, a
removable or non-removable interface is typically used
such as interface 1326.
[0104] It is to be appreciated that Fig 13 describes
software that acts as an intermediary between users
and the basic computer resourggs^iescribed in suitafe]e
operating environment 1310. Such software includes an
operating system 1328. Operating system 1328, which
can be stored on disk storage 1324, acts to control and
5 allocate resources of the computer system 1312. Sys-__
tern applications 1330 take advantage of the manage-
ment of resources by operating system 1328 through
program modules 1332 and program data 1334 stored
either in system memory 1 31 6 or on disk storage 1 324.
10 it is to be appreciated that the^present invention can be
implemented with various operating systems or combi-
nations of operating systems.
[0105] A user enters commands or information into
the computer 12 through input device(s) 1336. Input de-
15 vices 1 336 include, but are not limited to, a pointing de-
vice such as a mouse, trackball, stylus, touch pad, key-
board, microphone, joystick, game pad, satellite dish,
scanner, TV tuner card, digital camera, digital video
camera, web camera, and the like. These and other in-
20 put devices connect to the processing unit 1 31 4 through
the system bus 1318 via interface port(s) 1338. Interface
port(s) 1338 include, for example, aserialport, a parallel
port, a game port, and a universal serial bus (USB) . Out-
put device(s) 1 340 use some of the same type of ports
25 as input device(s) 1336. Thus, for example, a USB port
may be used to provide input to computer 1 312, and to
output information from computer 1312 to an output de-
vice 1340. Output adapter 1342 is provided to illustrate
that there are some output devices 1340 like monitors,
30 speakers, and printers among other output devices
1 340 that require special adapters. The output adapters
1342 include, by way of illustration and not limitation,
video and sound cards that provide a means of connec-
tion between the output device 1 340 and the system bus
35 1 31 8. It should be noted that other devices and/or sys-
tems of devices provide both input and output capabili-
ties such as remote computer(s) 1344.
[0106] Computer 1312 can operate in a networked
environment using logical connections to one or more
40 remote computers, such as remote computer(s) 1344.
The remote computer(s) 1 344 can be a personal com-
puter, a server, a router, a network PC, a workstation, a
microprocessor based appliance, a peer device or other
common network node and the like, and typically in-
45 eludes many or all of the elements described relative to
computer 1 312. For purposes of brevity, only a memory
storage device 1346 is illustrated with remote computer
(s) 1344. Remote computer(s) 1344 is logically connect-
ed to computer 1312 through a network interface 1348
50 and then physically connected via communication con-
nection 1350. Network interface 1348 encompasses
communication networks such as local-area networks
(LAN) and wide-area networks (WAN). LAN technolo-
gies include Fiber Distributed Data Interface (FDDI),
55 Copper Distributed Data Interface (CDDI), Ethernet/
IEEE 1302. 3 : Token Ring/IEEE 1302.5 and the like.
WAN technologies include, but are not limited to, point-
to-point links, circuit switching networks like Integrated
BNSDCCID: <EP 1 376427 A2 J _>
27
EP 1 376 427 A2
28
Services Digital Networks (ISDN) and variations there-
on, packet switching networks, and Digital Subscriber
Lines (DSL).
[0107] Communication connection(s) 1350 refers to
the hardware/software employed to connect the net-
work interface 1 348 to the bus 1 31 8. While communica-
tion connection 1350 is shown for illustrative clarity in-
side computer 1 312, it can also be external to computer
1 31 2. The hardware/software necessary for connection
to the network interface 1348 includes, for exemplary
purposes only, internal and external technologies such
as, modems including regular telephone grade mo-
dems, cable modems and DSL modems, ISDN adapt-
ers, and Ethernet cards.
[0108] What has been described above includes ex-
amples of the present invention. It is : of course, not pos-
sible to describe every conceivable combination of com-
ponents or methodologies for purposes of describing
the present invention, but one of ordinary skill in the art
may recogni?e that many further combinations and per-
mutations of the present invention are possible. Accord-
ingly, the present invention is intended to embrace all
such alterations, modifications and variations that fall
within the spirit and scope of the appended claims. Fur-
thermore, to the extent that the term "includes" is used
in eitherthe detailed description orthe claims, such term
is intended to be inclusive in amannersimilartotheterm
"comprising" as "comprising" is interpreted when em-
ployed as a transitional word in a claim.
10
15
20
25
30
5. The system of claim J , t haxha llenge being a com-
putational challenge.
6. The system of claim 5, the computational challenge
being a one-way hash ofjhe message including
time stamp and recipient stamp.
7. The system of claim 1 . the challenge being a human
challenge.
8. The system of claim 1 , the challenge being a micro-
payment request.
9. The system of claim 1 , a user being given a choice
of challenges, the choice of challenges being based
upon a filter.
10. The system of claim 1 , a difficulty of the challenge
being based, at least in part, upon the associated
probability that the e-mail message is spam.
11. A system facilitating detection of unsolicited mes-
sages, comprising:
a mail classifier that receives an incoming mes-
sage and classifies the incoming message as
spam or a legitimate message; and,
a challenge component that sends a challenge
to a sender of the message if the message is
classified as spam. ..
Claims
1 . A system facilitating detection of unsolicited e-mail,
comprising:
an e-mail component that receives or stores
messages and receives or computes associat-
ed probabilities that the e-mail messages are
spam; and,
a challenge component that sends a challenge
to an originator of an e-mail message having
an associated probability greater than a first
threshold.
2. The system of claim 1 , further comprising a mail
classifier that receives e-mail messages and deter-
mines the associated probability that the e-mail
message is spam.
3. The system of claim 1 , the challenge component
further modifying the associated probability that the
e-mail message is spam based, at least in part, up-
on a response to the challenge.
4. The system of claim 1 , the challenge being an em-
bedded code.
35
40
12. The system of claim 11, the mail classifier further
storing the incoming message in a spam folder or a
legitimate message folder.
13. The system of claim 12, the challenge component
further moving the message from the spam folder
to the legitimate message folder based, at least in
part, upon a response to the challenge.
14. The system of claim 11 , the challenge being an em-
bedded code.
15. The system of claim 11 , the challenge being a com-
45 putational challenge.
16. The system of claim 11, the challenge being a hu-
man challenge.
50 17. The system of claim 11 , the challenge being a mi-
cropayment request.
18. The system of claim 11 , further comprising a legiti-
mate message sender(s) store that stores informa-
55 tion associated with a sender of legitimate message
(s).
19. The system of claim 18, the challenge component
15
Best Available Copy
BNSDOClD: <EP_
.1 376427 A2_l_>
29
EP 1 376 427 A2
30
adding information associated with the sender of
the message to the legitimate message sender(s)
store, if the challenge is responded to correctly.
20. The system of claim 11 , further comprising a spam
sender(s) store that stores information associated
with a sender of spam.
21 . A system facilitating detection of unsolicited e-mail,
comprising:
a mail classifier that receives an incoming e-
mail message and classifies the incoming e-
mail message as spam, questionable spam or
legitimate e-mail; and,
a challenge component that sends a challenge
to a sender of an e-mail message classified as
questionable spam.
22. The system of claim 21 , the mail classifier further
storing the incoming e-mail message in a spam fold-
er, a questionable spam or a legitimate mail folder.
23. The system of claim 22, the challenge component
further moving the e-mail message from the ques-
tionable spam folder to the spam folder or the legit-
imate mail folder based, at least in part, upon a re-
sponse to the challenge.
24. The system of claim 21 , the challenge being at least
one of an embedded code, a computational chal-
lenge, a human challenge and a micropayment re-
quest.
25. The system of claim 21 further comprising a legiti-
mate e-mail sender(s) store that stores information
associated with a sender of legitimate e-mail.
26. The system of claim 21 , further comprising a spam
sender(s) store that stores information associated
with a sender of spam.
27. The system of claim 21 , the e-mail message includ-
ing a per recipient ID.
28. The system of claim 21 , the challenge component
further adapted to detect whether the e-mail mes-
sage is from a mailing list.
29. The system of claim 28, the challenge component
further adapted to detect whether the mailing list is
moderated or unmoderated.
30. A method for detecting unsolicited e-mail, compris-
ing:
sending a challenge to a sender of an e-mail
message classified as questionable spam;
75
20
receiving a responsejgj^e challenge; and,^ .
modifying the classification of the e-mail mes-
sage based, at least in part, upon the response
to the challenge.
5 _ . ,.. •
31 . The method of claim 30, further comprising at least
one of the following acts, receiving the e-mail mes-
sage;
classifying the e-mail message as spam,
10 questionable spam or legitimate e-mail;
determining whether the sender is stored in a
legitimate e-mail sender(s) store: and,
determining whether the sender is in a spam
sender(s) store.
32. The method of claim 30, the challenge being at least
one of an embedded code, a computational chal-
lenge, a human challenge and a micropayment re-
quest.
33. A method for responding to e-mail challenges, com-
prising:
receiving challenges to e-mail messages;
25 ordering the challenges based, at least in part,
upon a message with fewer challenges proc-
essed before a message with more challenges;
processing the challenge of the message with
fewer challenges; and,
30 sending a response to the challenge of the
message with fewer challenges.
34. A data packet transmitted between two or more
computer components that facilitates unsolicited e-
35 mail detection, the data packet comprising:
a data field comprising information associated
with a challenge, the challenge being based, at
least in part, upon an associated probability
40 that an e-mail message is spam.
35. A computer readable medium storing computer ex-
ecutable components of a system facilitating detec-
tion of unsolicited e-mail, comprising:
45
a mail classifier component that receives e-mail
messages and determines an associated prob-
ability that the e-mail message is spam; and,
a challenge component that sends a challenge
50 to a sender of an e-mail message having an as-
sociated probability greater than a first thresh-
old.
36. A system facilitating detection of unsolicited e-mail,
55 comprising:
means for determining an associated probabil-
ity that an e-mail message is spam; and,
16
BNSDOCID: <EP 1 376427 A2J_>
31 EP 1 376 427 A2 32
means for sending a challenge to a sender of
an e-mail message having an associated prob-
ability greater than a first threshold.
BNSDOCID: <EP 1376427A2_I_>
10
15
20
25
30
35
40
45
50
55
Best Available Copy
17
EP 1 376 427 A2
E-MAIL COMPONENT
i
CHALLENGE
COMPONENT
120
100
110
CHALLENGE(S)
RESPONSE(S) TO
CHALLENGE(S)
FIG. 1
INCOMING E-MAIL
MESSAGE(S)
130
200
E-MAIL COMPONENT
110
120
CHALLENGE(S)
RESPONSE(S) TO
CHALLENGE(S)
FIG. 2
- ; .
18
1 376427 A2 I >
EP 1 376 427 A2
300
INCOMING E-MAIL
MESSAGE(S)
310
330
340
SPAM FOLDER(S)
LEGITIMATE E-MAIL
FOLDER(S)
CHALLENGE
COMPONENT
CHALLENGE(S)
320
RESPONSE(S) TO
CHALLENGE(S)
FIG. 3
Best Available Copy
19
BNSDOCID: <EP 1 376427 A2_l_>
EP 1 376 427 A2
310
400
INCOMING E-MAIL
MESSAGE(S)
MAIL CLASSIFIER
SPAM
FOLDER(S)
LEGITIMATE
E-MAIL
FOLDER(S)
330
340
CHALLENGE(S)
CHALLENGE
COMPONENT
320
LEGITIMATE
E-MAIL
SENDER(S)
RESPONSE(S) TO
CHALLENGE(S)
SPAM
SENDER(S)
STORE
360
350
FIG. 4
hi
20
BNSDOCID: <EP
1 376427 A2_l_>
EP 1 376 427 A2
500
INCOMING E-MAIL
MESSAGE(S)
SPAM
FOLDER(S)
530
MAIL CLASSIFIER
QUESTIONABLE
SPAM
FOLDER(S)
540
CHALLENGE
COMPONENT
510
520
LEGITIMATE
E-MAIL
FOLDER(S)
CHALLENGE(S)
RESPONSE(S) TO
CH ALLEN G E(S)
FIG. 5
Best Available Copy
21
BNSDOCID: <EP 1376427A2 I >
EP 1 376 427 A2
600
INCOMING E-MAIL
MESSAGE(S)
MAIL CLASSIFIER
510
SPAM
FOLDER(S)
QUESTIONABLE
SPAM
FOLDER(S)
LEGITIMATE
E-MAIL
FOLDER(S)
530
540
550
CHALLENGE
COMPONENT
CHALLENGE(S)
LEGITIMATE
E-MAIL
SENDER(S)
RESPONSE(S) TO
CHALLENGE(S)
SPAM
SENDER(S)
560
570
FIG. 6
22
FMMSDOCID <EP _ 1 376427 A2J_>
1*
EP 1 376 427 A2
CHALLENGE
700
CHALLENGE
RECEIVER
COMPONENT
710
*
CHALLENGE
PROCESSOR
COMPONENT
720
i
r
j
•
CHALLENGE
RESPONSE
COMPONENT
RESPONSE TO
CHALLENGE
730
FIG. 7
23
Best Available Copy
BNSDOCID: <EP
.1376427A2_I_>
EP 1 376 427 A2
800
RECEIVE E-MAIL MESSAGE
!
r
804
DETERMINE PROBABILITY THAT E-MAIL
MESSAGE IS SPAM
808
TO
FIG. 9
FIG. 8
» *
. » —
24
1 376427 A2J_>
V
EP 1 376 427 A2
FROM
FIG. 8
FROM
FIG. 8
800
E-MAIL MESSAGE IS
IDENTIFIED AS SPAM
I
816
824
ADD SENDER TO SPAM
SENDER(S) STORE
848
844
c
I
END
FROM
FIG. 8
E-MAIL MESSAGE IS
IDENTIFIED AS "NOT
SPAM"
I
ADD SENDER TO
LEGITIMATE E-MAIL
SENDER(S) STORE
FIG. 9
25
Best Available Copy
BNSDOCID: <EP 1 376427 A2J_>
EP 1 376 427 A2
C
START
)
RECEIVE CHALLENGE
SEND RESPONSE TO CHALLENGE
c
I
END
1000
1010
1020
1030
1040
FIG. 10
• ; . I'
26
EP 1 376 427 A2
C
START
1100
SEND E-MAIL MESSAGE(S)
1110
RECEIVE CHALLENGE(S)
1120
1
ORDER CHALLENGE(S) TO BE PROCESSED BASED,
AT LEAST IN PART, UPON MESSAGE(S) WITH FEWER
CHALLENGE(S) PROCESSED BEFORE MESSAGE(S)
WITH MORE CHALLENGES PROCESSED
1160
SEND RESP
ONSE TO S
ELECTED CHALLENGE
1130
1140
1150
MORE CHALLENGE(S) TO
PROCESS?
NO
c
END
FIG. 11
BNSDOCID: <EP
1376427A2_I_>
Best Available Copy
27
\
EP 1 376 427 A2
1200
THE E-MAIL MESSAGE YOU SENT HAS BEEN DETECTED AS
POTENTIAL SPAM. UNLESS YOU CORRECTLY RESPOND TCT ON E
OF THE CHALLENGES IDENTIFIED BELOW, THE E-MAIL MESSAGE
"* MAY BE IDENTIFIED AS SPAM AND/OR DELETED Ad SPAM.
COMPUTER
COMPUTATIONAL
CHALLENGE
HUMAN
CHALLENGE
FIG. 12
28
BNSDOCID: <EP 1 376427 A2J_>
*
EP 1 376 427 A2
I
Processing
I u "it |
1316
System \
Memory
Volatile
Non VolatileJ ^
1328
j Operating System
................ ......
/rrrr: - 1330
Applications j
1332
I Modules :
1310
1334
Data i
1314
1320
1342
Output
Adapter(s)
1
1338
Interface
Port(s)
3*
1318
CO
1326
- >
Disk
Storage
1350
Communication
Connection(s)
1324
1344
1312
Output
Device(s)
1340
Input '
Device(s)
^ 1336
r
Network
Interface
I
1348
Remote
Computer(s)
Memory
Storage
1346
FIG. 13
Best Available Copy
29
BNSDOCID: <EP
.1376427A2_I_>
THIS PAGE BLANK (USPTO)
(19)
Europaisches Patentamt
European Patent Office
Office europeen des brevets
(11)
427 #3
(12)
EUROPEAN PATENT APPLICATION
(88) Date of publication A3:
31.03.2004 Bulletin 2004/14
(51) lntCl7: G06F 17/60
(43) Date of publication A2:
02.01.2004 Bulletin 2004/01
(21) Application number: 03006814.2
(22) Date of filing: 26.03.2003
(84) Designated Contracting States:
AT BE BG CH CY CZ DE DK EE ES Fl FR GB GR
HU IE IT LI LU MC NL PT RO SE SI SK TR
Designed Extension States:
AL LT LV MK
(30) Priority: 26.06.2002 US 180565
(71) Applicant: MICROSOFT CORPORATION
Redmond, Washington 98052-6399 (US)
(72) Inventors:
• Goodman, Joshua Theodore
Redmond, Washington 98052 (US)
• Rounthwaite, Robert L.
Fall City, Washington 98024 (US)
(74) Representative: Grunecker, Kinkeldey,
Stockmair & Schwanhausser Anwaltssozietat
Maximilianstrasse 58
80538 Munchen (DE)
(54) SPAM detector with challenges
(57) A system and method facilitating detection of
unsolicited e-mail message(s) with challenges is provid-
ed. The invention includes an e-mail component and a
challenge component. The system can receive e-mail
message(s) and associated probabilities that the e-mail
message(s) are spam. Based, at least in part, upon the
associated probability, the system can send a challenge
to a sender of an e-mail message. The challenge can
be an embedded code, computational challenge, hu-
man challenge and/or micropayment request. Based, at
;east in part, upon a response to the challenge (or lack
of response), the challenge component can modify the
associated probability and/or delete the e-mail mes-
sage.
CO
<
CM
CO
co
LU
E-MAIL COMPONENT
CHALLENGE
COMPONENT
120
100
110
CHALLENGE(S)
RESPONSE(S) TO
CHALLENGE(S)
FIG. 1
CD
O
g:
CI
o
Printed by Jouve, 75001 PARIS (FR)
BNSDOCID: <EP
1376427A3J_>
EP 1 376 427 A3
European Patent
Office
EUROPEAN SEARCH REPORT
Application Number
~ EP— €3 00 6814
DOCUMENTS CONSIDERED TO BE RELEVANT
Citation of document with indication, where appropriate,
of relevant passages
Relevant
to daim
CLASSIFICATION OF THE
APPLICATION (int.Cr.7)
8
ET AL)
D Y | US 6 161 130 A (HECKERMAN DAVID E
12 December 2000 (2000-12-12)
* abstract *
* column 1, line 10 - line 14 *
* column 4, line 40 - line 49 *
* column 8, line 40 - line 66 *
* claim 1 *
W0 99 10817 A (COBB CHRISTOPHER ALAN)
4 March 1999 (1999-03-04)
* abstract *
* page 3, line 4 - line 17 *
* page 7, line 19 - page 8, line 10 *
* page 16, 1 ine 19 - 1 ine 23 *
* page 11, line 18 - line 21 *
* page 28, line 27 - line 30 *
* figure 7 *
US 6 112 227 A (HEINER JEFFREY NELSON)
29 August 2000 (2000-08-29)
* abstract *
* column 2, line 1 - line 10 *
* column 3, 1 ine 58 - column 4 , line 1
* column 4, line 15 - line 30 *
* figure 2 *
JULIAN BYRNE: "MY Spamblock"
NEWSGROUP CITATION, 'Online!
19 January 1997 (1997-01-19), XP002267503
news . admi n . net-abuse . emai 1
Retrieved from the Internet:
<URL • http: //www . googl e . com/groups?hl =en&l r
=&i e=UTF-8&oe=UTF-8&sel m=32ElA4FD . 41C6%40a
ny.where&rnum=l> 'retrieved on 2004-01-20!
* the whole document *
-/—
1-36
606F17/60
1-36
il-36
TECHNICAL FIELDS
SEARCHED (lnt.Ct.7)
G06F
1-36
The present search report has been drawn up tor alt claims
Place ol search
THE HAGUE
Da»e ol com p lei ion d the search
21 January 2004
Examiner
Rossi er, T
CATEGORY OF CITED DOCUMENTS
X : particularly relevant it taken alone
Y . particularly relevant it combined with another
document ot the same category
A : technological background
O : non-wrrtlen disclosure
p : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on. or
after the filing dale
D document cited in the application
L : document cited for other teasons
& " memMr ot trie same patent family, corresponding
document
BNSDOCID <EP_ 1376427A3_I_>
EP 1 376 427 A3
European Patent
Office
EUROPEAN SEARCH REPORT
Application Number
EP 03 00 6814
DOCUMENTS CONSIDERED TO BE RELEVANT
Category
Citation of document with indication, where appropriate,
of relevant passages
Relevant
to claim
CLASSIFICATION OF THE
APPLICATION (lnLCI.7)
DAVID SKOLL: "How to make sure a human is
sending you mail"
NEWSGROUP CITATION, 'Online!
17 November 1997 (1997-11-17), XP002267504
news . admi n . net-abuse . usenet
Retrieved from the Internet:
<URL : http : / /groups . goog 1 e . ca/gr oups?hl =en&
lr=&ie=UTF-8&oe=UTF-8&selm=561uge%246on%40
bertrand . ccs . carleton . ca>
'retrieved on 2004-01-20!
* the whole document *
1-36
TECHNICAL FIELDS
SEARCHED <lnt.CI7)
The present search report has been drawn up lor all claims
P*ac* of searen
Cate en compfelor. o' tn«= sesro-i
Exam.net
THE HAGUE
21 January 2004
Rossier, T
c
o
i
CD
in
5
C
o
o
CL
CATEGORY OF CITED DOCUMENTS
X : particular!/ relevant if taken alone
Y : particularly relevant If combined with another
Oocumem of lire same category
A : technological background
O : non- written disclosure
P : intermediate document
T : t he-Dry or principle underlying the invention
E : earlier patent document, du! published on, or
after the filing dale
D : document cited in the application
L : document cited lor other reasons
a : member of the same patent family, corresponding
document
Best Available Copy
BNSDOCID: <EP
1376427A3 I >
EP 1 376 427 A3
ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO.
EP 03 00 6814
This anriex
The
The European
M, the patent rami* members retewg to the patem^merts
cited in the above-mentioned European search report.
Ambers are as CKtfned ^ffS^S * * « mere.y g „en to, the purpose o. in.crmat.on
Paten: document
cited in search report
US 6161130
W0 9910817
Publication
date
Patent family
members)
EP
1090368 Al
WO
9967731 Al
US
6199102 Bl
wo
9910817 Al
21-01-2004
Publication
date
11-04-2001
29-12-1999
06-03-2001
04-03-1999
US.6112227
29-08-2000 NONE
I For more dettfs about this annex : see Official Joumri of the European Patent Office, No. 12/82
BNSDOCID <EP.
1376427 A3 J _>